Submitted to the Annals of Statistics 
arXiv: math. PR/0000000 



NOMINAL ASSOCIATION VECTOR AND MATRIX 

By Wenxue Huang, Yong Shi and Xiaogang Wang 

Shantou University, Chinese Academy of Science and York University 
When response variables are nominal and populations are cross- 
classified with respect to multiple polytomies, questions often arise 
about the degree of association of the responses with explanatory 
variables. When populations are known, we introduce a nominal as- 
sociation vector and matrix to evaluate the dependence of a response 
variable with an explanatory variable. These measures provide de- 
tailed evaluations of nominal associations at both local and global 
levels. We also define a general class of global association measures 
which embraces the well known association measure by Goodman- 
Kruskal (1954). The proposed association matrix also gives rise to 
the expected generalized confusion matrix in classification. The
hierarchy of equivalence relations defined by the association vector and
matrix is also shown.

1. Introduction. Many studies in biology, psychology, sociology and
text mining deal with a nominal response variable and categorical
explanatory variables. When a parametric model, such as a logistic or
log-linear model, is employed, a standard statistical analysis can be
performed to determine the significance of the explanatory variables.
Agresti ([1]) and Fienberg ([3]) provide excellent accounts of parametric
methods for analyzing categorical data.



AMS 2000 subject classifications: Primary 62H20 

Keywords and phrases: Association matrix, association measures, association vector, 
categorical data, equivalence relation, generalized confusion matrix, the Goodman-Kruskal 






2 W. HUANG, R. SHI AND X. WANG 

When the number of covariates is large, a direct employment of a
predictive model could encounter serious computational difficulties. To
reduce the dimensionality, one must first select a much smaller collection
of highly relevant covariates. An accurate evaluation of any existing
association among categorical variables therefore becomes crucial for
analyzing high dimensional categorical data. A measure of association for
categorical variables is referred to as nominal if no scale or order of
the categories is of interest. Several nominal association measures are
available in the literature. Goodman and Kruskal ([7]) argued that many
such measures of association for nominal data stem from the standard
chi-square statistic upon which a test of independence is usually based.
They also argued that the class of measures based on chi-square lacks
interpretability, and they considered alternative measures based on
proportional predictions. The method of proportional prediction has been
widely used in clinical diagnosis, inventory management, risk management
and social studies. Goodman ([5] and [6]) provided a thorough review of
the analysis of cross-classified data and also presented a general method
for cross-classified data without a response variable.

When the response and explanatory variables are both nominal, the method
by Goodman and Kruskal ([7]) can be further generalized (see Costner [2]
and Sarndal [11]). A measure of association of the response variable Y with
the explanatory variable X can be defined as

(1.1)    $\tau^{Y|X} = \frac{V(Y) - V(Y|X)}{V(Y)}$,

where V(Y) represents a measure of uncertainty in Y without knowledge of
X, and V(Y|X) symbolizes the uncertainty in Y when X is known.

Entropy and the Gini concentration are the most widely used variance
measures for nominal data. Entropy has been widely used in information
theory. The Gini concentration has been widely used in the statistical
analysis of categorical data and in economics to measure inequality.
Detailed discussions of the entropy and the Gini concentration can be
found in Lloyd [8].
The Gini concentration as the measure of variability is defined by

(1.2)    $V_G(X) := \sum_{i=1}^{n_X} P(X = i)\,(1 - P(X = i))$,

where $n_X$ represents the number of classes of X.

Combining equations (1.1) and (1.2), the association measure defined by
(1.1) reduces to Goodman and Kruskal's $\tau$ (the GK $\tau$), that is

(1.3)    $\tau^{Y|X} = \frac{\sum_{i=1}^{n_Y} \sum_{j=1}^{n_X} P(Y = i; X = j)^2 / P(X = j) - \sum_{i=1}^{n_Y} P(Y = i)^2}{1 - \sum_{i=1}^{n_Y} P(Y = i)^2}$,

where $n_Y$ and $n_X$ represent the number of classes of the response
variable Y and of the covariate X, respectively. The GK $\tau$ is actually
equivalent to the conditional Gini index.
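As a quick numerical reading of (1.2) and (1.3), the following Python sketch computes the Gini concentration and the GK tau from a joint probability table. The function names and the two small tables are illustrative assumptions and are not taken from the paper.

```python
def gini_concentration(p):
    """V_G of a probability vector p, as in (1.2)."""
    return sum(pi * (1.0 - pi) for pi in p)


def gk_tau(joint):
    """GK tau of Y on X from joint[j][i] = P(X=j, Y=i), as in (1.3)."""
    px = [sum(row) for row in joint]          # marginal of X
    py = [sum(col) for col in zip(*joint)]    # marginal of Y
    # sum_i sum_j P(Y=i, X=j)^2 / P(X=j), skipping empty X classes
    explained = sum(
        v ** 2 / px[j]
        for j, row in enumerate(joint) if px[j] > 0
        for v in row
    )
    sum_sq = sum(p * p for p in py)
    # the denominator 1 - sum_i P(Y=i)^2 is exactly V_G(Y)
    return (explained - sum_sq) / gini_concentration(py)


# Hypothetical tables: an independent pair and a fully determined response.
independent = [[0.12, 0.28], [0.18, 0.42]]
determined = [[0.4, 0.0], [0.0, 0.6]]
print(gk_tau(independent), gk_tau(determined))
```

In this sketch the independent table should give a value of essentially zero and the determined table a value of essentially one, up to floating-point rounding.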

We introduce an association vector which measures the association of each
local response category with an explanatory variable. This vector also
provides the expected local accuracy lift rates for proportional
prediction. A general class of global association degrees is also
introduced, based on a convex combination of the local association
measures in the association vector. The coefficients can be chosen
according to the objective of the inference. Many measures of association
can be derived from the proposed global measure by using different sets of
coefficients. Moreover, the proposed global measure coincides with the
GK $\tau$ when the response variable is dichotomous or when the weights are
set to a particular function of the marginal probabilities of the
response variable.






We also propose an association matrix to estimate the generalized
confusion matrix before an actual classification. It also provides the
distribution of the first-type-like and second-type-like prediction error
rates for proportional prediction.

Furthermore, we show that there exists a hierarchy of equivalence relations 
induced by these measures. This hierarchy provides important insights into 
the proposed association measures. It is expected to play a crucial role in 
feature selection and cross classification. 

This paper is organized as follows. In Section 2, we introduce the
association vector and matrix. We also define a general class of
association measures by using the proposed association vector. The
hierarchy of equivalence relations induced by the association vector, the
association matrix, and the GK $\tau$ is shown in Section 3. We illustrate
the properties of the proposed association measures by examining two
examples in Section 4. Discussions are provided in Section 5.

2. Association Vector and Matrix. Let X and Y be categorical
variables with domains Dmn(X) = {1, 2, ..., $n_X$} and Dmn(Y) = {1, 2, ..., $n_Y$},
respectively. For any $s \in$ Dmn(Y), we assume that P(Y = s) > 0 (thus
P(Y = s) < 1).

2.1. Association Vector. We introduce an association vector which
measures the degree to which each response category is associated with an
explanatory variable.

Definition 2.1. The association vector
$\Theta^{Y|X} := (\theta^{Y=1|X}, \ldots, \theta^{Y=n_Y|X})$ is given by

(2.1)    $\theta^{Y=s|X} := \frac{E[P(Y = s|X)^2] - P(Y = s)^2}{P(Y = s)\,(1 - P(Y = s))}$,    $s = 1, 2, \ldots, n_Y$.

When a proportional predictive model based on X is deployed, the
components of the vector $\Theta^{Y|X}$ are exactly the error reduction (or
accuracy lift) rates for proportional prediction over those using the
information of Y only.
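The definition in (2.1) can be sketched in a few lines of Python. The function name, the table layout `joint[j][s] = P(X=j, Y=s)` and the example table are assumptions made for illustration only.

```python
def association_vector(joint):
    """theta^{Y=s|X} for every s, from joint[j][s] = P(X=j, Y=s); see (2.1)."""
    px = [sum(row) for row in joint]          # P(X=j)
    py = [sum(col) for col in zip(*joint)]    # P(Y=s)
    theta = []
    for s, ps in enumerate(py):
        # E[P(Y=s|X)^2] = sum_j P(X=j, Y=s)^2 / P(X=j)
        second_moment = sum(
            row[s] ** 2 / px[j] for j, row in enumerate(joint) if px[j] > 0
        )
        theta.append((second_moment - ps ** 2) / (ps * (1.0 - ps)))
    return theta


# Hypothetical table: category 0 of Y is perfectly identified by X,
# while categories 1 and 2 are only partly separated.
joint = [[0.2, 0.0, 0.0],
         [0.0, 0.3, 0.1],
         [0.0, 0.1, 0.3]]
print(association_vector(joint))
```

For this made-up table the first component is 1 (the category is fully determined by X) and the other two are 0.375, which illustrates how the vector reports association category by category.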

Proposition 2.2. Assume that P(Y = s) > 0 for any $s \in$ Dmn(Y).
Then

(i) $0 \leq \theta^{Y=s|X} \leq 1$;

(ii) $\theta^{Y=s|X} = 0$ for all s $\iff$ Y and X are independent;

(iii) $\theta^{Y=s|X} = 1 \iff P(Y = s|X = i) = 1$ or 0, for all $i \in$ Dmn(X).

Proof. One checks that

(2.2)    $\theta^{Y=s|X} = \frac{\sum_{j \in \mathrm{Dmn}(X)} P(X = j, Y = s)^2 / P(X = j) - P(Y = s)^2}{P(Y = s)\,(1 - P(Y = s))}$

         $= \sum_{j} \frac{(P(X = j, Y = s) - P(X = j) P(Y = s))^2}{P(X = j)\, P(Y = s)\,(1 - P(Y = s))} \geq 0$,

and

(2.3)    $E[P(Y = s|X)^2] = \sum_{j} P(Y = s|X = j)^2 P(X = j) \leq \sum_{j} P(Y = s|X = j) P(X = j) = P(Y = s)$.

Thus $\theta^{Y=s|X} \leq 1$. The rest follows from (2.2) and (2.3). □

2.2. A Class of Global Association Degrees. One might also be interested
in the overall nominal dependence of a response variable on an explanatory
variable. Assume that P(Y = s) > 0 for all s = 1, 2, ..., $n_Y$ ($n_Y \geq 2$).
We now define a general class of measures of association.




Definition 2.3. Given a weight vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{n_Y})$ with
$\sum_s \alpha_s = 1$ and $\alpha_s \geq 0$ for all s = 1, 2, ..., $n_Y$, the global
association degree is defined as

         $\tau_\alpha^{Y|X} := \sum_{s=1}^{n_Y} \alpha_s\, \theta^{Y=s|X}$.

We call $\tau_\alpha^{Y|X}$ the $\alpha$-association degree of Y on X. We call a weight
vector $\alpha$ regular if $\alpha_s > 0$ for all s = 1, 2, ..., $n_Y$, in other words,
if every single scenario of Y contributes to the evaluation of the overall
nominal dependence. Sometimes, however, some scenarios of Y may be
"merged" with others, e.g., in the CART decision tree or in Value-at-Risk
based risk analysis. The weight vector therefore provides the analyst with
a mechanism to place a desired emphasis on certain scenarios given
different inferential objectives. In particular, each component of the
association vector can be reproduced by setting $\alpha_s = 1$ for a given s.
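To make the role of the weight vector concrete, the sketch below evaluates the global association degree as a weighted sum of the components of the association vector. The helper names, the validation of the weights and the example table are illustrative assumptions.

```python
def association_vector(joint):
    """theta^{Y=s|X} from joint[j][s] = P(X=j, Y=s); see (2.1)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return [
        (sum(row[s] ** 2 / px[j] for j, row in enumerate(joint) if px[j] > 0)
         - ps ** 2) / (ps * (1.0 - ps))
        for s, ps in enumerate(py)
    ]


def global_association(joint, alpha):
    """alpha-association degree: convex combination of the local measures."""
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must be non-negative and sum to one")
    return sum(a * t for a, t in zip(alpha, association_vector(joint)))


joint = [[0.2, 0.0, 0.0],
         [0.0, 0.3, 0.1],
         [0.0, 0.1, 0.3]]          # hypothetical joint distribution
uniform = [1 / 3, 1 / 3, 1 / 3]    # a regular weight vector
only_first = [1.0, 0.0, 0.0]       # reproduces the first local component
print(global_association(joint, uniform), global_association(joint, only_first))
```

A weight vector that puts all mass on one category returns the corresponding component of the association vector, which is the special case mentioned at the end of the definition.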

The following properties of $\tau_\alpha^{Y|X}$ follow from Proposition 2.2.

Theorem 2.4. Assume $\alpha_s > 0$ for all $s \in$ Dmn(Y). Then

(i) $0 \leq \tau_\alpha^{Y|X} \leq 1$;

(ii) $\tau_\alpha^{Y|X} = 0 \iff$ Y and X are independent;

(iii) $\tau_\alpha^{Y|X} = 1 \iff$ Y is completely determined by X $\iff \tau^{Y|X} = 1$.

Given a population, the association degree is a function of $\alpha$ and
changes accordingly. When employing the proportional prediction model, one
can produce the GK $\tau$ from the proposed global association degree.



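Theorem 2.5. If the weight vector is assigned as follows:

(2.4)    $\alpha_P = \frac{1}{V_G(Y)} \left( P(Y = 1) - P(Y = 1)^2, \ldots, P(Y = n_Y) - P(Y = n_Y)^2 \right)$,

where $V_G(Y) = \sum_{s} P(Y = s)(1 - P(Y = s))$, we then have

(2.5)    $\tau^{Y|X} = \tau_{\alpha_P}^{Y|X}$.

Proof. Observe that

    RHS $= \frac{1}{V_G(Y)} \sum_{i=1}^{n_Y} P(Y = i)(1 - P(Y = i))\, \theta^{Y=i|X}$

        $= \frac{1}{V_G(Y)} \sum_{i=1}^{n_Y} \left( E[P(Y = i|X)^2] - P(Y = i)^2 \right)$

        $= \frac{1}{V_G(Y)} \left( \sum_{i=1}^{n_Y} \sum_{j=1}^{n_X} P(Y = i, X = j)^2 / P(X = j) - \sum_{i=1}^{n_Y} P(Y = i)^2 \right)$,

which is the right-hand side of (1.3), since $V_G(Y) = 1 - \sum_{i=1}^{n_Y} P(Y = i)^2$.
This completes the proof. □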
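Theorem 2.5 can be checked numerically on any joint table. The sketch below, with assumed helper names and a made-up table, builds the weight vector of (2.4) and compares the resulting global degree with the GK tau computed directly from (1.3).

```python
def marginals(joint):
    """Return (P(X=j))_j and (P(Y=s))_s from joint[j][s] = P(X=j, Y=s)."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]


def association_vector(joint):
    px, py = marginals(joint)
    return [
        (sum(row[s] ** 2 / px[j] for j, row in enumerate(joint) if px[j] > 0)
         - ps ** 2) / (ps * (1.0 - ps))
        for s, ps in enumerate(py)
    ]


def gk_tau(joint):
    """GK tau computed directly from (1.3)."""
    px, py = marginals(joint)
    explained = sum(
        v ** 2 / px[j] for j, row in enumerate(joint) if px[j] > 0 for v in row
    )
    sum_sq = sum(p * p for p in py)
    return (explained - sum_sq) / (1.0 - sum_sq)


def alpha_p(joint):
    """Weight vector of (2.4): P(Y=s)(1 - P(Y=s)) / V_G(Y)."""
    _, py = marginals(joint)
    v_g = sum(p * (1.0 - p) for p in py)
    return [p * (1.0 - p) / v_g for p in py]


joint = [[0.2, 0.0, 0.0],
         [0.0, 0.3, 0.1],
         [0.0, 0.1, 0.3]]  # hypothetical joint distribution
tau_alpha = sum(a * t for a, t in zip(alpha_p(joint), association_vector(joint)))
print(tau_alpha, gk_tau(joint))  # the two values agree up to rounding error
```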

2.3. Association Matrix. In cross classification with a multinomial
(instead of binomial) response variable, the association vector
essentially shows "correct" classification rates. A complete picture of
the cross classification, including both correct classifications and
detailed misclassifications, would be more informative. This motivates
the following definition.

Definition 2.6. The association matrix is given by

(2.6)    $\gamma(Y|X) := \left( \gamma^{st}(Y|X) \right)$,

where

         $\gamma^{st}(Y|X) := \frac{E[P(Y = s|X)\, P(Y = t|X)]}{P(Y = s)}$,

where $s, t = 1, 2, \ldots, n_Y$.

It can be seen that the association matrix $\gamma(Y|X)$ has the following
properties:

(1) $\gamma(Y|X)$ is a row stochastic matrix;

(2) $\theta^{Y=s|X}$ is the normalization of $\gamma^{ss}(Y|X)$;

(3) $\gamma^{st}(Y|X) = P(\hat{Y} = t \mid Y = s, X)$, where $\hat{Y}$ is the predicted value
of Y under the proportional prediction with X as the predictor.

If Y is binary and a proportional prediction is deployed, the matrix
$\gamma(Y|X)$ based on the training samples can be explained as the expected
confusion-like matrix commonly used in classification, while the
conditional probability matrix of $P(\hat{Y}|Y)$ based on actual and predicted
responses is a confusion-like matrix. Our proposed association matrix
provides the distribution of error rates for proportional prediction.

In addition, the (s,t)-entry, $\gamma^{st}(Y|X)$, of the association matrix
$\gamma(Y|X)$ can be regarded as the probability of assigning the true value
Y = s to $\hat{Y} = t$ when the proportional prediction is deployed.
Furthermore, when $t \neq s$ and s is fixed, $\gamma^{st}(Y|X)$ is a first-type-like
error rate while $\gamma^{ts}(Y|X)$ is the second-type-like error rate. If we
compute all these error rates based on the relationship demonstrated in
$(Y, \hat{Y})$, the result is referred to as the generalized confusion matrix.

More properties of the matrix and relationship with the proposed associ- 
ation vector will be shown in the next section. 
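The association matrix of Definition 2.6 can be sketched as follows. The function name and the example table are assumptions for illustration; the final loop only checks the row-stochastic property stated above.

```python
def association_matrix(joint):
    """gamma^{st}(Y|X) = E[P(Y=s|X) P(Y=t|X)] / P(Y=s),
    computed from joint[j][s] = P(X=j, Y=s)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    n_y = len(py)
    return [
        [
            # E[P(Y=s|X) P(Y=t|X)] = sum_j P(X=j,Y=s) P(X=j,Y=t) / P(X=j)
            sum(row[s] * row[t] / px[j]
                for j, row in enumerate(joint) if px[j] > 0) / py[s]
            for t in range(n_y)
        ]
        for s in range(n_y)
    ]


joint = [[0.2, 0.0, 0.0],
         [0.0, 0.3, 0.1],
         [0.0, 0.1, 0.3]]  # hypothetical joint distribution
gamma = association_matrix(joint)
for row in gamma:
    print([round(v, 4) for v in row], "row sum:", round(sum(row), 12))
```

Each row sums to one because the entries of row s are the probabilities of predicting each category t given that the true category is s.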

3. Hierarchy of Equivalence Relations. In this section, we establish
the hierarchy of equivalence relations defined by the association matrix,
the association vector and a global association degree. Recall that for a
point set A, a binary relation "~" on the points of A is referred to as
an equivalence relation if it is

• reflexive: x ~ x for all $x \in A$,

• symmetric: if x ~ y then y ~ x, where $x, y \in A$, and

• transitive: if x ~ y and y ~ z then x ~ z, where $x, y, z \in A$.






Denote the set of explanatory variables by $\mathcal{C}$. We now present the five
equivalence relations as follows.

Definition 3.1. Let $X_1, X_2 \in \mathcal{C}$ and let Y be a response variable. With
respect to Y, the variables $X_1$ and $X_2$ are

3.1.1 E-1 equivalent, if $\tau^{X_1|X_2} = \tau^{X_2|X_1} = \tau^{Y|X_1} = 1$;

3.1.2 E-2 equivalent, if $\tau^{Y|X_1} = 1 = \tau^{Y|X_2}$;

3.1.3 E-3 equivalent, if $\gamma(Y|X_1) = \gamma(Y|X_2)$;

3.1.4 E-4 equivalent, if $\Theta^{Y|X_1} = \Theta^{Y|X_2}$;

3.1.5 E-5 equivalent with respect to a weight vector $\alpha$, if $\tau_\alpha^{Y|X_1} = \tau_\alpha^{Y|X_2}$.

Theorem 3.2. All the above defined binary relations E-i, i = 1, 2, 3, 4, 5,
are indeed equivalence relations on the set $\mathcal{C}$. Furthermore, if $X_1$ and
$X_2$ are E-i equivalent (with respect to Y), then they are E-(i+1)
equivalent (with respect to Y), for i = 1, 2, 3, 4.

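Proof. Observe that if $X_1$ and $X_2$ are E-2 equivalent with respect to Y,
then $\gamma(Y|X_i) = I_{n_Y}$, the identity matrix of degree $n_Y$, i = 1, 2.
One checks,

(3.1)    $\gamma^{st}(Y|X) = \sum_{i \in \mathrm{Dmn}(X)} \frac{P(X = i, Y = s)\, P(X = i, Y = t)}{P(X = i)\, P(Y = s)}$.

Thus we have

(3.2)    $\gamma^{ss}(Y|X) = (1 - P(Y = s))\, \theta^{Y=s|X} + P(Y = s)$.

Notice that we have

         $\tau_\alpha^{Y|X} = \sum_{s \in \mathrm{Dmn}(Y)} \alpha_s\, \theta^{Y=s|X}$.

This completes the proof. □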




From equation (3.2), one can see that the association vector and matrix 
are functions of each other. 
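For readers who want the intermediate step behind (3.2), the following short derivation combines Definition 2.6 (with t = s) and definition (2.1). It is a worked restatement using only the symbols already introduced, not an additional result.

```latex
\begin{align*}
\gamma^{ss}(Y|X)
  &= \frac{E[P(Y=s|X)^2]}{P(Y=s)}
     && \text{(Definition 2.6 with } t = s\text{)} \\
  &= \frac{P(Y=s)\,(1-P(Y=s))\,\theta^{Y=s|X} + P(Y=s)^2}{P(Y=s)}
     && \text{(solving (2.1) for } E[P(Y=s|X)^2]\text{)} \\
  &= (1 - P(Y=s))\,\theta^{Y=s|X} + P(Y=s).
\end{align*}
```

Read in the other direction, the same identity recovers each component of the association vector from the corresponding diagonal entry of the association matrix.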

Several remarks related to the equivalence relations are in order. 

Remark 3.3. The E-i equivalence relations depend on the choice of the
response variable Y. We now show that E-i equivalence is strictly stronger
than E-(i+1) equivalence. We are given categorical variables $X_1$, $X_2$ and Y
in a given data set S.

a. Consider the following data set S.

    Y               1      0      0      1
    X_1             1      2      3      4
    X_2             2      3      1      2
    probability    2/7    2/7    2/7    1/7

Then we see that $X_1, X_2, Y$ satisfy 3.1.2 but not 3.1.1. Thus 3.1.2 $\not\Rightarrow$
3.1.1.

b. To see in general 3.1.3 $\not\Rightarrow$ 3.1.2, we notice that if $X_1, X_2, Y$ satisfy
3.1.2, then

         $\gamma^{st}(Y|X_1) = \delta_{st} = \gamma^{st}(Y|X_2)$

for all $s, t \in$ Dmn(Y). There exist $X_1, X_2, Y$ satisfying 3.1.3 but not
3.1.2, i.e.,

         $(\gamma^{st}(Y|X_1)) = (\gamma^{st}(Y|X_2)) \neq (\delta_{st})$.
c. To see 3.1.4 $\not\Rightarrow$ 3.1.3 generally, we consider the following data set S.

    Y               1      2      2      4      3      4
    X_1             1      1      2      2      3      3
    X_2             1      3      2      3      1      2
    probability    1/6    1/6    1/6    1/6    1/6    1/6

Then

         $\gamma^{ss}(Y|X_i) = \frac{1}{2}$,    s = 1, 2, 3, 4;  i = 1, 2,

while

         $\gamma^{12}(Y|X_1) = \frac{1}{2} \neq 0 = \gamma^{12}(Y|X_2)$.

d. To see 3.1.5 $\not\Rightarrow$ 3.1.4 generally, we consider the following data set S.

    Y               1      1      2      3      1      2      3      2
    X_1             1      1      2      3      4      1      1      4
    X_2             2      1      1      1      4      1      3      4
    probability    1/10   2/10   1/10   1/10   1/10   2/10   1/10   1/10

Then

         $\tau^{Y|X_1} = \tau^{Y|X_2} = \frac{13}{48}$,

         $(\theta^{Y=1|X_1}, \theta^{Y=2|X_1}, \theta^{Y=3|X_1}) = \left( \frac{1}{6}, \frac{17}{72}, \frac{23}{48} \right)$,

         $(\theta^{Y=1|X_2}, \theta^{Y=2|X_2}, \theta^{Y=3|X_2}) = \left( \frac{17}{72}, \frac{1}{6}, \frac{23}{48} \right)$.

e. If we replace E-2 equivalence with E-2', where E-2' is defined as "$X_1$
and $X_2$ are E-2' equivalent if $\tau^{X_1|X_2} = 1 = \tau^{X_2|X_1}$", then the
stronger-to-weaker chain

         E-1 $\Rightarrow$ E-2' $\Rightarrow$ E-3 $\Rightarrow$ E-4 $\Rightarrow$ E-5    (but not vice versa)

still holds.

To see that E-2' implies E-3, notice that since $\tau^{X_1|X_2} = 1 = \tau^{X_2|X_1}$,
we have that |Dmn($X_1$)| = |Dmn($X_2$)| and for any event $X_1 = i$ there
is a unique $X_2 = j$ such that $P(X_2 = j \mid X_1 = i) = 1$, and vice versa.
Assume that Dmn($X_1$) = $\{i_1, \ldots, i_k\}$. Then Dmn($X_2$) = $\{j_1, \ldots, j_k\}$
and we may and shall assume that

         $P(X_2 = j_q \mid X_1 = i_q) = 1 = P(X_1 = i_q \mid X_2 = j_q)$,    $q = 1, \ldots, k$.

Thus

         $\gamma^{st}(Y|X_1) = \sum_{q=1}^{k} \frac{P(X_1 = i_q, Y = s)\, P(X_1 = i_q, Y = t)}{P(X_1 = i_q)\, P(Y = s)}$

                   $= \sum_{q=1}^{k} \frac{P(X_2 = j_q, Y = s)\, P(X_2 = j_q, Y = t)}{P(X_2 = j_q)\, P(Y = s)}$

                   $= \gamma^{st}(Y|X_2)$.

Thus $X_1$ is E-3 equivalent to $X_2$. It is easy to see that in general E-3
equivalence does not imply E-2' equivalence.

f. The E-1 equivalence condition can be replaced with "if $\tau_\alpha^{X_1|X_2} =
\tau_\alpha^{X_2|X_1} = \tau_\alpha^{Y|X_1} = 1$ for a regular weight vector $\alpha$".

Actually, $\tau_\alpha^{X_1|X_2} = \tau_\alpha^{X_2|X_1} = \tau_\alpha^{Y|X_1} = 1$ if and only if
$\tau^{X_1|X_2} = \tau^{X_2|X_1} = \tau^{Y|X_1} = 1$.

Similarly, the E-2 equivalence condition can be replaced with "if
$\tau_\alpha^{Y|X_1} = 1 = \tau_\alpha^{Y|X_2}$ for a regular weight vector $\alpha$".
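The two counterexamples in items c and d of Remark 3.3 can be verified with exact rational arithmetic. The sketch below is one way to do so; the helper names are assumptions, the local measures are obtained from the diagonal of the association matrix via (3.2), and the GK tau is obtained via Theorem 2.5.

```python
from collections import defaultdict
from fractions import Fraction as F


def joint_table(xs, ys, probs):
    """Map (x, y) -> P(X=x, Y=y) from record-level data."""
    joint = defaultdict(F)
    for x, y, p in zip(xs, ys, probs):
        joint[(x, y)] += p
    return joint


def marginal(joint, axis):
    """Marginal of X (axis 0) or of Y (axis 1)."""
    m = defaultdict(F)
    for key, p in joint.items():
        m[key[axis]] += p
    return m


def gamma(joint, s, t):
    """Entry (s, t) of the association matrix, as in (3.1)."""
    px, py = marginal(joint, 0), marginal(joint, 1)
    total = sum(joint.get((x, s), 0) * joint.get((x, t), 0) / px[x] for x in px)
    return total / py[s]


def theta(joint, s):
    """Local association, from (3.2) solved for theta."""
    ps = marginal(joint, 1)[s]
    return (gamma(joint, s, s) - ps) / (1 - ps)


def gk_tau(joint):
    """GK tau as the alpha_P-weighted combination of Theorem 2.5."""
    py = marginal(joint, 1)
    v_g = sum(p * (1 - p) for p in py.values())
    return sum(p * (1 - p) * theta(joint, s) for s, p in py.items()) / v_g


# Item c: equal association vectors, different association matrices.
y_c = [1, 2, 2, 4, 3, 4]
p_c = [F(1, 6)] * 6
c1 = joint_table([1, 1, 2, 2, 3, 3], y_c, p_c)
c2 = joint_table([1, 3, 2, 3, 1, 2], y_c, p_c)
print([gamma(c1, s, s) for s in (1, 2, 3, 4)], gamma(c1, 1, 2), gamma(c2, 1, 2))

# Item d: equal GK tau, different association vectors.
y_d = [1, 1, 2, 3, 1, 2, 3, 2]
p_d = [F(k, 10) for k in (1, 2, 1, 1, 1, 2, 1, 1)]
d1 = joint_table([1, 1, 2, 3, 4, 1, 1, 4], y_d, p_d)
d2 = joint_table([2, 1, 1, 1, 4, 1, 3, 4], y_d, p_d)
print(gk_tau(d1), gk_tau(d2))
print([theta(d1, s) for s in (1, 2, 3)], [theta(d2, s) for s in (1, 2, 3)])
```

Using `Fraction` avoids rounding, so the equalities and inequalities in the remark can be compared exactly rather than up to a tolerance.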

The equivalence relations and the hierarchy are expected to lay a foun- 
dation for feature selection and prediction for categorical data. 

In clinical trials, direct/search marketing or risk management, one also
often faces the case of a binary response variable. We shall see from the
following theorem that the five relations actually degenerate to three.



Theorem 3.4. If the response variable is dichotomous, say, Dmn(Y) =
{0, 1}, then for any weight vector $\alpha$

(1) $\theta^{Y=s|X} = \tau_\alpha^{Y|X}$, s = 0, 1;

(2) with respect to Y, the E-i equivalence relations are the same for i =
3, 4, 5.

Proof. (1). It is a routine check.
(2). Labelling the two categories of Y as 1 and 2, one checks, by (1) and
(3.2), that

         $\gamma^{11}(Y|X) = P(Y = 1) + \frac{V_G(Y)\, \tau_\alpha^{Y|X}}{2 P(Y = 1)}$,

         $\gamma^{22}(Y|X) = P(Y = 2) + \frac{V_G(Y)\, \tau_\alpha^{Y|X}}{2 P(Y = 2)}$.

Notice that $\gamma(Y|X)$ is a two-by-two row stochastic matrix. So the matrix
is uniquely determined by $\tau_\alpha^{Y|X}$. □

This theorem implies that $\tau_\alpha^{Y|X}$ (in particular, the GK $\tau$) is enough in
the case of a binary response variable.
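The binary case can be illustrated on a small table. The sketch below uses exact fractions and a made-up joint distribution of a three-category X and a binary Y; all names are assumptions. It checks that both local measures coincide with the GK tau and that the diagonal of the association matrix has the form used in the proof.

```python
from fractions import Fraction as F

# Hypothetical joint[j][s] = P(X=j, Y=s) with a binary response.
joint = [[F(1, 10), F(3, 10)],
         [F(2, 10), F(1, 10)],
         [F(1, 10), F(2, 10)]]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]


def second_moment(s, t):
    """E[P(Y=s|X) P(Y=t|X)] = sum_j P(X=j,Y=s) P(X=j,Y=t) / P(X=j)."""
    return sum(row[s] * row[t] / px[j] for j, row in enumerate(joint))


# Local measures (2.1), Gini concentration (1.2) and GK tau (1.3).
theta = [(second_moment(s, s) - py[s] ** 2) / (py[s] * (1 - py[s]))
         for s in range(2)]
v_g = sum(p * (1 - p) for p in py)
tau = sum(second_moment(s, s) - py[s] ** 2 for s in range(2)) / v_g

# Diagonal of the association matrix (Definition 2.6).
gamma_diag = [second_moment(s, s) / py[s] for s in range(2)]

print(theta, tau)
print(gamma_diag, [py[s] + v_g * tau / (2 * py[s]) for s in range(2)])
```

Because both components of the association vector are equal here, every convex combination of them gives the same value, which is why the choice of weight vector does not matter for a dichotomous response.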

The following theorem describes what the joint distribution of $(X_1, X_2)$
looks like if $X_1$ and $X_2$ are E-2 equivalent with respect to Y.

Theorem 3.5. $X_1$ and $X_2$ are E-2 equivalent w.r.t. Y if and only if
Y is completely determined by either of $X_1$ and $X_2$; and in this case, there
exist hard partitions Par($X_i$) := $\{X_i^s \mid s \in \mathrm{Dmn}(Y)\}$ of Dmn($X_i$), i = 1, 2,
where each $X_i^s$ consists of some scenarios of $X_i$, such that

         $\{(a_1, a_2) \mid a_1 \in X_1^s,\ a_2 \in X_2^t\} = \emptyset$  whenever $s \neq t$;

while

         $\{(a_1, a_2) \mid a_1 \in X_1^s,\ a_2 \in X_2^s\} \neq \emptyset$

for all $s \in$ Dmn(Y).



Proof. Without loss of generality, we may and shall assume Dmn(Y) =
{1, 2, ..., $n_Y$}. Since $\tau^{Y|X_i} = 1$, the pairs $(X_i, Y)$ in S naturally define a
deterministic surjective function $f_i$ : Dmn($X_i$) $\to$ Dmn(Y) so that
$P(Y = f_i(a_i) \mid X_i = a_i) = 1$. Thus $X_i^p := f_i^{-1}(p)$, p = 1, ..., $n_Y$, defines a
partition of Dmn($X_i$). We want to show that

         $S(p,p) := \{(a_1, a_2) \mid a_i \in f_i^{-1}(p),\ i = 1, 2\} \neq \emptyset$,

while when $1 \leq p \neq q \leq n_Y$,

         $S(p,q) := \{(a_1, a_2) \mid a_1 \in f_1^{-1}(p),\ a_2 \in f_2^{-1}(q)\} = \emptyset$.

In fact, let $a_1 \in f_1^{-1}(p)$. Then there exists $a_2 \in$ Dmn($X_2$) such that
$(a_1, a_2, p) \in X_1 \times X_2 \times Y$. Thus $a_2 \in f_2^{-1}(p)$. Hence $S(p,p) \neq \emptyset$. By the
arbitrariness of $a_1$ in $f_1^{-1}(p)$, we see that $S(p,q) = \emptyset$ when $p \neq q$.
The above argument also shows that

         $X_1 \times X_2 = \bigcup_{p \in \mathrm{Dmn}(Y)} S(p,p)$.

This completes the proof. □

4. Examples and Data Analysis. To illustrate, we present two ex- 
amples including analysis results from a credit risk management data set. 

Example 4.1. We first consider a data set with 23,370 observations.
The response variable has 6 scenarios with the following probability
distribution:

         p(Y) = (.1048, .3083, .3062, .1563, .1092, .0142).

(i) To demonstrate, we generate another categorical variable which is
independent of the response variable. The generated explanatory variable
has 6 categories. Since the response and explanatory variables are
independent, every component of the association vector should be zero in
theory, and the global $\alpha$-association degree should be zero for any given
weight vector $\alpha$. The estimated association vector is calculated to be

         $\Theta = (2 \times 10^{-3},\ 2 \times 10^{-3},\ 3 \times 10^{-3},\ 10^{-3},\ 5 \times 10^{-3},\ 3 \times 10^{-3})$.

(ii) Next we select an actual explanatory variable contained in the same
data set. The joint frequency table is given in Table 1.

    X\Y         1       2       3       4       5       6     (X,·)
    1         788     183       0       0       0       0       971
    2        1089    4358    3006     800     160       0      9412
    3         320    1665    2544    1558    1101      55      7242
    4         131     559     949     746     660      97      3142
    5          92     363     583     493     522     141      2194
    6          29      78      75      78     109      40       409
    (·,Y)    2449    7205    7156    3675    2552     333     23370

                              Table 1
                       Joint frequency table.

The GK $\tau$ and the association vector are as follows:

         $\tau$ = .0763;    $\Theta$ = (.2437, .0778, .0236, .0413, .0806, .0355).

This shows that the selected explanatory variable has a mild association
with the response variable.

(iii) In order to demonstrate the properties of the association matrix, we
randomly select 80% of the observations and put them into the training
set. The rest is set aside as the test set.

The association vectors restricted to the training and test sets are
calculated:

         $\Theta_{train}$ = (.2348, .1369, .0457, .0374, .0500, .0158);
         $\Theta_{test}$  = (.2502, .1410, .0532, .0336, .0608, .0121).




By using the same proportional prediction principle described in Goodman
and Kruskal [7], one can make a guess on the category of the response
variable for each observation based on the conditional distributions, and
then obtain the generalized confusion matrix.
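The guessing procedure can be imitated by simulation: draw a record, then predict its category at random from the conditional distribution of Y given the observed X. The sketch below does this for a made-up joint table and compares the empirical generalized confusion matrix with the association matrix; the table, the function names, the seed and the number of draws are all assumptions, and the original analysis used the real training and test data instead.

```python
import random


def association_matrix(joint):
    """gamma^{st}(Y|X) from joint[j][s] = P(X=j, Y=s) (Definition 2.6)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    n_y = len(py)
    return [[sum(row[s] * row[t] / px[j] for j, row in enumerate(joint)) / py[s]
             for t in range(n_y)] for s in range(n_y)]


def simulate_confusion(joint, n_draws, seed=0):
    """Empirical P(predicted = t | true = s) under proportional prediction."""
    rng = random.Random(seed)
    n_x, n_y = len(joint), len(joint[0])
    cells = [(j, s) for j in range(n_x) for s in range(n_y)]
    weights = [joint[j][s] for j, s in cells]
    counts = [[0] * n_y for _ in range(n_y)]
    for j, s in rng.choices(cells, weights=weights, k=n_draws):
        # proportional prediction: guess t with probability P(Y=t | X=j)
        t = rng.choices(range(n_y), weights=joint[j])[0]
        counts[s][t] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]


# Hypothetical joint distribution with three categories each.
joint = [[0.20, 0.05, 0.05],
         [0.05, 0.25, 0.10],
         [0.05, 0.10, 0.15]]
expected = association_matrix(joint)
observed = simulate_confusion(joint, n_draws=50000, seed=1)
for e_row, o_row in zip(expected, observed):
    print([round(v, 3) for v in e_row], [round(v, 3) for v in o_row])
```

With enough draws the two matrices should be close entry by entry, which mirrors the comparison between training and test sets reported next.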

The association matrix based on the training set (left) and the
generalized confusion matrix on the test set (right) are given as:

    .26  .48  .15  .06  .04  .01            .23  .50  .18  .04  .04  .01
    .05  .49  .28  .11  .07  .01            .05  .48  .27  .11  .07  .01
    .02  .37  .34  .15  .11  .02     vs     .01  .35  .35  .16  .11  .02
    .02  .32  .35  .17  .12  .02            .02  .33  .33  .17  .11  .03
    .02  .30  .34  .17  .14  .03            .01  .28  .37  .16  .13  .04
    .03  .28  .33  .18  .15  .03            .06  .24  .33  .18  .18  .01



It can be seen that the generalized confusion matrix obtained from the
test set is very close to the association matrix obtained from the
training set. Since both matrices are row-stochastic, we test the
hypothesis that the two matrices form two identical distributions. The
hypothesis was not rejected since the p-value is very close to 1.



Example 4.2. We now consider a real loan application data set discussed
in [12] and [10]. This data set has several variables and 650 records. For
simplicity, we are only concerned with these five (categorical or
discretized) variables: On-Time, Age, Income, Credit and Risk, where each
variable was categorized as On-Time = (No (0), Yes (1)); Age = (young, med,
sen); Income = (low, mid, hi); Risk = (low, med, hi); Credit = (red, yellow,
green). We consider three situations in which On-Time, Risk, and Credit
are used as the response variable, respectively.



1. For response Y = On-Time, we observe that p(Y) = (0.1, 0.9). Since
Y is binary, by Theorem 3.4, $\tau_\alpha^{Y|X} = \tau^{Y|X} = \theta^{Y=i|X}$, i = 0, 1.

    X               Credit    Risk     Age      Income
    $\tau^{Y|X}$    .0577     .0486    .0402    .0134



2. For response Y = Risk, p(Y) = (.4877, .0400, .4723), we obtain the
following results:

    X         $\tau^{Y|X}$   $\Theta^{Y|X}$            $\gamma(Y|X)$              (X,Y) freq.

    On-Time   .0432    (.0451, .0002, .0479)   .5108  .0407  .4485     11    2   52
                                               .4959  .0402  .4639    306   24  255
                                               .4631  .0393  .4976

    Age       .5137    (.5451, .0018, .5611)   .7669  .0437  .1894     13    9  246
                                               .5324  .0417  .4258    291   17   61
                                               .1956  .0361  .7684     13    0    0

    Income    .0272    (.0368, .0207, .0185)   .5065  .0345  .459      19    8   45
                                               .4206  .0599  .5195    211   17  209
                                               .4739  .044   .4821     87    1   53

    Credit    .0009    (.0006, .0008, .0012)   .488   .0401  .4719     35    2   40
                                               .4892  .0408  .4700     98    9   93
                                               .4872  .0398  .4729    184   15  174



3. For response Y = Credit, p(·, Y) = (.1185, .3077, .5738), we obtain the
following results:

    X         $\tau^{Y|X}$   $\Theta^{Y|X}$            $\gamma(Y|X)$              (X,Y) freq.

    On-Time   .0319    (.0322, .0123, .0488)   .1468  .3328  .5204     19   30   16
                                               .1281  .3162  .5556     58  170  357
                                               .1074  .2979  .5946

    Age       .0035    (.0099, .0028, .0014)   .1272  .3023  .5705     40   80  148
                                               .1164  .3096  .5740     34  118  217
                                               .1178  .3078  .5744      3    2    8

    Income    .001     (.0007, .0006, .0016)   .1191  .3085  .5724      7   20   45
                                               .1188  .3081  .5731     54  137  246
                                               .1182  .3073  .5745     16   43   82

    Risk      .0005    (.0016, .0003, .0002)   .1199  .3069  .5733     35   98  184
                                               .1181  .3079  .5739      2    9   15
                                               .1183  .3077  .5739     40   93  174
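As a consistency check on the tables above, the sketch below recomputes the Age row for the response Y = Risk from its joint frequency table. The frequency matrix is read with rows as Age categories and columns as Risk categories, and the two blank cells of its last row are taken to be zero; both readings are assumptions, supported by the fact that the column totals then match p(Y). Variable names are illustrative.

```python
# Joint frequencies for X = Age (rows) and Y = Risk (columns), 650 records.
counts = [[13, 9, 246],
          [291, 17, 61],
          [13, 0, 0]]
n = sum(map(sum, counts))
joint = [[c / n for c in row] for row in counts]
px = [sum(row) for row in joint]
py = [sum(col) for col in zip(*joint)]


def second_moment(s, t):
    """E[P(Y=s|X) P(Y=t|X)] = sum_j P(X=j,Y=s) P(X=j,Y=t) / P(X=j)."""
    return sum(row[s] * row[t] / px[j] for j, row in enumerate(joint))


# Association vector (2.1), GK tau (1.3) and association matrix (2.6).
theta = [(second_moment(s, s) - py[s] ** 2) / (py[s] * (1 - py[s]))
         for s in range(3)]
tau = (sum(second_moment(s, s) - py[s] ** 2 for s in range(3))
       / sum(p * (1 - p) for p in py))
gamma = [[second_moment(s, t) / py[s] for t in range(3)] for s in range(3)]

print(n, [round(p, 4) for p in py])
print(round(tau, 4), [round(t, 4) for t in theta])
for row in gamma:
    print([round(v, 4) for v in row])
```

Under these assumptions the printed values should agree with the tabulated entries for Age up to rounding in the fourth decimal place.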





The variable Risk was generated by a seemingly subjective discretization
of the debt-over-asset ratios and is meant to reflect the degree of risk
of the loan or borrower. The calculations show that Risk and Credit are
almost independent of each other. Moreover, the variable On-Time is only
weakly associated with each of the two variables.




We think that there are two main reasons: (1) Either the credit scoring or
the assigned risk is of poor quality, or both are. In this case, the
continuous variable should be properly discretized. (2) The existing
categorized Risk is a subjective or conventional classification of the
debt-over-asset ratios. We can see that this classification is almost
meaningless for loan risk management.

5. Discussions and Future Work. We introduce an association vec- 
tor, a class of global association degrees based on the association vector and 
weight vectors, and an association matrix. We also study the equivalence 
relations induced by the association measures introduced. 

The association vector measures both local-on-global and global-on-global
nominal cross-classification dependence. One may directly see the expected
lifts of proportional prediction for each value of the response variable.
The association vector is essentially equivalent to the diagonal of the
association matrix. Various global associations can be derived from the
association vector with various weights. Our subsequent work on feature
selection algorithms will be based on the class of global association
measures proposed in this article.

The association matrix based on proportional classification gives rise to 
the accuracy rate and error rate distribution. It is an extension of the con- 
fusion matrix widely used in classification. We expect more research on rel- 
evant statistical inference and variations of the association matrix in the 
future. 

The hierarchy helps deepen the understanding of local and global
statistical association among variables and of the structure of a
multivariate distribution. The equivalence relations are expected to play
a crucial role in the analysis of high dimensional categorical data when
the populations are not known and sample sizes are small.

References. 

[1] Agresti, A. (2002). Categorical Data Analysis, John Wiley, New York.
[2] Costner, H.L. (1965). Criteria for measures of association, American
Sociological Review, 30, 341-353.
[3] Fienberg, S. (2007). The Analysis of Cross-Classified Categorical Data,
Springer, New York.
[4] Gini, C.W. (1971). Variability and Mutability, contribution to the study
of statistical distributions and relations, Studi Economico-Giuridici della
R. Universita de Cagliari (1912). Reviewed in: Light, R.J., Margolin, B.H.:
An Analysis of Variance for Categorical Data. Journal of the American
Statistical Association, 66, 534-544.
[5] Goodman, L.A. (1996). A Single General Method for the Analysis of
Cross-Classified Data: Reconciliation and Synthesis of Some Methods of
Pearson, Yule and Fisher, and Also Some Methods of Correspondence Analysis
and Association Analysis, The Journal of the American Statistical
Association, 91, 408-428.
[6] Goodman, L.A. (2000). The analysis of cross-classified data: notes on a
century of progress in contingency table analysis, and some comments on its
prehistory and its future, Statistics for the 21st Century, editors: Rao,
C.R. and Szekely, G.J., 189-231, Marcel Dekker.
[7] Goodman, L.A. and Kruskal, W.H. (1954). Measures of Association for
Cross Classifications, The Journal of the American Statistical Association,
49, 732-764.
[8] Lloyd, C.J. (1999). Statistical analysis of categorical data, John Wiley
& Sons.
[9] Micheli-Tzanakou, E. (1999). Supervised and unsupervised pattern
recognition: feature extraction and computational, CRC Press.
[10] Olson, D. and Shi, Y. (2007). Introduction to business data mining,
McGraw-Hill.
[11] Sarndal, C.E. (1974). A comparative study of association measures,
Psychometrika, 39, 165-187.
[12] Seppanen, M.S., Kumar, S. and Chandra, C. (2004). Process Analysis and
Improvement: Tools and Techniques, McGraw-Hill Higher Education.






Wenxue Huang
Department of Mathematics
Shantou University
P.R. China
E-MAIL: wxhuang@stu.cdu.cn

Yong Shi
Research Center for Fictitious Economics and Data Science
Chinese Academy of Science
P.R. China
E-MAIL: yshi@gucas.ac.cn

Xiaogang (Steven) Wang
Department of Mathematics and Statistics
York University
Canada
E-MAIL: stevenw@mathstat.yorku.ca



